ADF Data Cube Ontology v1.5.3 RF

The Allotrope Data Format (ADF) [[!ADF]] consists of several APIs, taxonomies and ontologies. This document describes the Allotrope Data Format Data Cube Ontology (ADF-DCO), which makes it possible to describe the structure and content of n-dimensional data. ADF-DCO is based on the RDF Data Cube Vocabulary (QB) [[!QB]] and extends QB with concepts for complex data types, data selections, scales, order functions, indexes and HDF mappings.

Disclaimer

THESE MATERIALS ARE PROVIDED "AS IS" AND ALLOTROPE EXPRESSLY DISCLAIMS ALL WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, INCLUDING, WITHOUT LIMITATION, THE WARRANTIES OF NON-INFRINGEMENT, TITLE, MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.

This document is part of a set of specifications on the Allotrope Data Format (ADF) [[!ADF]].


Introduction

The Allotrope Data Format (ADF) defines an interface for storing scientific observations from analytical chemistry. It is intended to provide long-term stability of archived analytical data as well as fast real-time access to it. The ADF Data Cube API (ADF-DC) defines an interface for storing raw analytical data. ADF-DCO uses the RDF Data Cube Vocabulary [[!QB]] and maps the abstract data cubes defined by the terms of the data cube ontology to their concrete HDF5 representations in the ADF file. The structure and metadata of HDF5 objects are described by an HDF5 ontology, which is based on the HDF5 specifications.

This document is structured as follows: First, the role of the ADF Data Cube API within the high-level structure of the ADF [[!ADF]] API stack is presented. Then, the requirements for the ADF Data Cube Ontology are described, and an overview of the structure of ADF-DCO and its relations to QB is given. Next, the concept of primitive and complex data types is explained and illustrated with example representations. The main section then presents the ADF-DCO concept details, starting with the extensions to QB component specifications and the introduction of scales and order functions. Finally, subsetting of data cubes and HDF5 mappings are described.

ADF-DCO will be published under http://purl.allotrope.org/ontologies/datacube

Document Conventions

Naming Conventions

The IRI of an entity has two parts: the namespace and the local identifier. Within one RDF document, the namespace might be associated with a shorter prefix. For instance, the namespace IRI http://www.w3.org/2002/07/owl# is commonly associated with the prefix owl:, and one can write owl:Class instead of the full IRI http://www.w3.org/2002/07/owl#Class.

Within the biomedical domain, the local identifier is often an alphanumeric ID which is not human-readable. The Allotrope Foundation Taxonomies follow this approach, e.g. a process is represented as af-p:AFP_0001617. To enhance readability, this document uses the preferred label from the ontology or taxonomy to refer to the corresponding entity, i.e., instead of af-p:AFP_0001617 the corresponding entity is referred to as af-p:process. If the namespace is clear from the context, the prefix MAY be omitted and the entity is named simply process. If the label contains spaces, the entity MAY be surrounded by guillemets to avoid ambiguities, e.g. «af-p:experimental method».

Namespaces

Within this specification, the following namespace prefix bindings are used:

Prefix          Namespace
owl:            http://www.w3.org/2002/07/owl#
rdf:            http://www.w3.org/1999/02/22-rdf-syntax-ns#
rdfs:           http://www.w3.org/2000/01/rdf-schema#
xsd:            http://www.w3.org/2001/XMLSchema#
skos:           http://www.w3.org/2004/02/skos/core#
dct:            http://purl.org/dc/terms/
foaf:           http://xmlns.com/foaf/0.1/
obo:            http://purl.obolibrary.org/obo/
qudt:           http://qudt.org/schema/qudt#
qudt-unit:      http://qudt.org/vocab/unit#
qudt-quantity:  http://qudt.org/vocab/quantity#
qb:             http://purl.org/linked-data/cube#
af-c:           http://purl.allotrope.org/ontologies/common#
af-m:           http://purl.allotrope.org/ontologies/material#
af-e:           http://purl.allotrope.org/ontologies/equipment#
af-p:           http://purl.allotrope.org/ontologies/process#
af-r:           http://purl.allotrope.org/ontologies/result#
af-x:           http://purl.allotrope.org/ontologies/property#
adf-dp:         http://purl.allotrope.org/ontologies/datapackage#
adf-dc:         http://purl.allotrope.org/ontologies/datacube#
adf-dc-hdf:     http://purl.allotrope.org/ontologies/datacube-hdf-map#
hdf:            http://purl.allotrope.org/ontologies/hdf5/1.8#
afs-qudt:       http://purl.allotrope.org/shapes/qudt#
afs-dc:         http://purl.allotrope.org/shapes/datacube#
afs-dc-hdf:     http://purl.allotrope.org/shapes/datacube-hdf-map#
afs-hdf:        http://purl.allotrope.org/shapes/hdf#
ex:             http://example.com/ns#

Indication of Requirement Levels

Within this document the definitions of MUST, SHOULD and MAY are used as defined in [[!rfc2119]].

Number Formatting

Within this document, decimal numbers will use a dot "." as the decimal mark.

ADF High-Level Structure

The following figure illustrates the high-level structure of the Allotrope Data Format (ADF) API stack:

The high-level structure of the Allotrope Data Format (ADF) API stack.

This document focuses on the ADF Data Cube Ontology, which is used by the ADF Data Cube API [[ADF-DC]] highlighted in the figure above.

Requirements

The ADF Data Cube Ontology (ADF-DCO) provides a data model for the structure of n-dimensional data and subsets thereof. It MUST be possible to specify the metadata of any n-dimensional data structure covered by the Allotrope Foundation Ontologies (AFO). Thus, the key requirement for the ADF Data Cube Ontology is that ADF-DCO MUST provide the following means:

ADF Data Cube Ontology High-Level Structure

The following figure illustrates the high-level structure of the ADF Data Cube (ADF-DC) API with the ADF Data Cube Ontology and its components.

ADF Data Cube Ontology high-level structure.

ADF-DCO imports and thus extends the RDF Data Cube Vocabulary (QB). The ADF-DC API is based on the vocabulary and data structures of ADF-DCO.

The RDF Data Cube Vocabulary

The following figure illustrates the high-level structure of the RDF Data Cube Vocabulary (QB) [[!QB]]:

The RDF Data Cube Vocabulary by W3C

In QB, the central classes are qb:DataSet, qb:DataStructureDefinition, qb:ComponentSpecification and qb:Slice. A qb:DataStructureDefinition defines the different components of a data set and thus provides the structure for the observations contained in a qb:DataSet. A qb:DataStructureDefinition can be reused by many qb:DataSet instances. The ADF Data Cube Ontology (ADF-DCO) extends the RDF Data Cube Vocabulary with classes and properties for the representation of complex data types, scales, order functions and data selections.

High-Level Class Relations

The following figure illustrates the relation between high-level classes of the ADF Data Cube Ontology and their relations to classes from QB.

High-level relations between classes of the ADF Data Cube Ontology and classes of the RDF Data Cube Vocabulary (QB).

ADF-DCO extends the QB vocabulary with several concepts. The QB classes qb:DataSet, qb:DataStructureDefinition and qb:ComponentSpecification remain central in ADF-DCO; however, the extensions for selections are defined in a parallel structure. For example, ADF-DCO defines adf-dc:DataSelection, a corresponding adf-dc:SelectionStructureDefinition and adf-dc:ComponentSelection. The details of the extensions are described in the next section.

ADF Data Cube Ontology Concept Details

The general schema of a qb:DataStructureDefinition is illustrated in the [[QB]] schema above. ADF-DCO makes several extensions to the basic schema defined by [[QB]] in order to allow efficient storage of observation data in [[HDF5]]. These extensions are described in the following subsections.

Primitive and Complex Data Types

The data types associated with the components of a qb:DataSet can be either primitive or complex.

Primitive Data Types

Primitive data types encompass all valid primitive RDF datatypes (for details see the section on datatypes in [[rdf11-concepts]]), such as xsd:integer, xsd:double and xsd:string, as well as rdfs:Resource.

Complex Data Types

Complex data types MAY be used when a single primitive data type is not sufficient to represent the values of one component. For instance, a measurement value with a unit or an error MAY be expressed by a complex data type.

Data Shapes for Complex Data Types

In ADF-DCO, a complex data type MUST be represented by a shape according to the Shapes Constraint Language (SHACL) [[SHACL]]. While SHACL allows specifying very complex graph patterns, ADF-DCO places the following restrictions on the usage of the SHACL vocabulary for the specification of complex data types:

  • A predicate MUST appear only once.
  • The only valid cardinality restriction is sh:minCount = sh:maxCount = 1.
These constraints are necessary to guarantee unambiguous representations and a natural evaluation of equality of instances of complex data types. Basically, a complex data type consists of a set of primitive data types which are organized according to a data shape. The motivation for organizing primitive values in the form of a complex data type can vary. The main reason, however, is that the corresponding values belong together and are retrieved together (e.g. a measurement value with unit and error term).
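Because every predicate occurs exactly once with cardinality one, an instance of a complex data type can be viewed as a mapping from predicates to single values, and equality reduces to comparing these mappings. The following Python sketch illustrates both points. The dictionary representation of shapes and values and the helper names `check_shape_restrictions` and `freeze` are hypothetical and not part of the ADF-DC API.

```python
from typing import Any, Mapping

# Hypothetical, simplified model of a shape: one dict per sh:property
# constraint, with the keys "predicate", "minCount" and "maxCount".
Shape = list[dict[str, Any]]


def check_shape_restrictions(shape: Shape) -> list[str]:
    """Report violations of the two ADF-DCO restrictions on complex data types."""
    errors: list[str] = []
    seen: set[str] = set()
    for constraint in shape:
        predicate = constraint["predicate"]
        # Restriction 1: a predicate MUST appear only once.
        if predicate in seen:
            errors.append(f"{predicate}: predicate appears more than once")
        seen.add(predicate)
        # Restriction 2: only sh:minCount = sh:maxCount = 1 is valid.
        if not (constraint.get("minCount") == constraint.get("maxCount") == 1):
            errors.append(f"{predicate}: cardinality must be minCount = maxCount = 1")
    return errors


def freeze(value: Any) -> Any:
    """Convert a (possibly nested) complex value into a hashable form.

    Since each predicate has exactly one value, the set of (predicate, value)
    pairs fully describes an instance; two instances are equal exactly when
    their frozen forms are equal, regardless of the order of the predicates.
    """
    if isinstance(value, Mapping):
        return frozenset((predicate, freeze(v)) for predicate, v in value.items())
    return value


# Example: the ex:QuantityValueType shape and two equal quantity values.
quantity_value_type: Shape = [
    {"predicate": "qudt:numericValue", "minCount": 1, "maxCount": 1},
    {"predicate": "qudt:unit", "minCount": 1, "maxCount": 1},
]
a = {"qudt:numericValue": 15.0, "qudt:unit": "qudt-unit:Gram"}
b = {"qudt:unit": "qudt-unit:Gram", "qudt:numericValue": 15.0}
```

Here `check_shape_restrictions(quantity_value_type)` reports no violations, and `freeze(a) == freeze(b)` holds even though the predicates are listed in a different order.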

Example: Quantity Values as Complex Data Types

The Allotrope Foundation Ontologies (AFO) [[AFO]] reuse [[!QUDT]] for the representation of quantity values and units. A quantity value, defined according to [[!QUDT]], is the most prominent example of a complex data type. The following shape describes the structure of a quantity value: in addition to the numeric value, a unit MUST be specified. This is necessary when the measurement values of one data set use different units.

ex:QuantityValueType  a sh:Shape ;
    sh:property [
      sh:predicate qudt:numericValue;
      sh:minCount 1;
      sh:maxCount 1;
      sh:nodeKind sh:Literal;
      sh:datatype xsd:double
    ];
    sh:property [
      sh:predicate qudt:unit;
      sh:minCount 1;
      sh:maxCount 1;
      sh:nodeKind sh:IRI;   	# the specified unit has to be an IRI
      sh:class qudt:Unit; 	# the unit must be an instance of qudt:Unit
    ];
    .
                

The following representation conforms to ex:QuantityValueType:

ex:MassValue a qudt:QuantityValue ;
    qudt:numericValue "15"^^xsd:double ;
    qudt:unit   qudt-unit:Gram .
                

Nested Complex Data Types

Complex data types can also be nested. For instance, the data type of a weighing result can be specified with tare and net weight, both represented by a complex data type with numeric value, error and a predefined unit:

ex:WeighingResultType  a sh:Shape ;
   sh:property [
      sh:predicate «af-x:tare weight»;
      sh:minCount 1;
      sh:maxCount 1;
      sh:nodeKind sh:Blank;             # the object of 'tare weight' must be a blank node
      sh:valueShape ex:MassValueType;   # use nested mass value shape
   ] ;
   sh:property [
      sh:predicate «af-x:net weight»;
      sh:minCount 1;
      sh:maxCount 1;
      sh:nodeKind sh:Blank;
      sh:valueShape ex:MassValueType;   # use nested mass value shape
   ] ;
    .

ex:MassValueType  a sh:Shape ;
    sh:property [
      sh:predicate qudt:numericValue;
      sh:minCount 1;
      sh:maxCount 1;
      sh:nodeKind sh:Literal;
      sh:datatype xsd:double
    ];
    sh:property [
      sh:predicate qudt:standardUncertainty;
      sh:minCount 1;
      sh:maxCount 1;
      sh:nodeKind sh:Literal;
      sh:datatype xsd:double
    ];
    sh:property [
      sh:predicate qudt:unit;
      sh:minCount 1;
      sh:maxCount 1;
      sh:nodeKind sh:IRI;
      sh:hasValue qudt-unit:Gram; 	# the unit MUST be qudt-unit:Gram
    ] .
                

The following result representation conforms to ex:WeighingResultType:

ex:WeighingResult a af-r:Result ;
    «af-x:tare weight» [
        qudt:numericValue "25.3332"^^xsd:double ;
        qudt:standardUncertainty "0.2"^^xsd:double ;
        qudt:unit qudt-unit:Gram
    ] ;
    «af-x:net weight» [
        qudt:numericValue "20.219"^^xsd:double ;
        qudt:standardUncertainty "0.2"^^xsd:double ;
        qudt:unit qudt-unit:Gram
    ] .
                

The following [[SHACL]] property constraints MAY be used to describe complex data types in ADF-DCO: sh:in, sh:datatype, sh:directType, sh:nodeKind, sh:hasValue, sh:class and sh:valueShape.
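The following Python sketch shows how a value might be checked against a shape that uses only this restricted subset. It is a simplified model, not a SHACL processor: values are nested dictionaries, IRIs are wrapped in a hypothetical `IRI` class, the XSD-to-Python type mapping is an assumption, and sh:class and sh:directType are omitted because they need type information from the RDF graph. Since every property has a cardinality of exactly one, minCount and maxCount are not repeated in the constraint dictionaries.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class IRI:
    """Hypothetical stand-in for an RDF IRI, written as a prefixed name."""
    name: str


# Assumed mapping from XSD datatypes to Python types (illustrative subset).
XSD_TYPES = {"xsd:double": float, "xsd:integer": int, "xsd:string": str, "xsd:boolean": bool}


def validate(value: dict, shape: list[dict], path: str = "") -> list[str]:
    """Check a complex value (predicate -> single value) against a restricted shape.

    Supported constraints: nodeKind, datatype, hasValue, in and valueShape.
    Every property is assumed to have sh:minCount = sh:maxCount = 1.
    """
    errors: list[str] = []
    for c in shape:
        p = c["predicate"]
        where = f"{path}/{p}" if path else p
        if p not in value:
            errors.append(f"{where}: missing value")
            continue
        v = value[p]
        kind = c.get("nodeKind")
        if kind == "sh:IRI" and not isinstance(v, IRI):
            errors.append(f"{where}: expected an IRI")
        elif kind == "sh:Literal" and isinstance(v, (IRI, dict)):
            errors.append(f"{where}: expected a literal")
        elif kind == "sh:Blank" and not isinstance(v, dict):
            errors.append(f"{where}: expected a blank node")
        if "datatype" in c and type(v) is not XSD_TYPES[c["datatype"]]:
            errors.append(f"{where}: expected {c['datatype']}")
        if "hasValue" in c and v != c["hasValue"]:
            errors.append(f"{where}: expected {c['hasValue']}")
        if "in" in c and v not in c["in"]:
            errors.append(f"{where}: value not in the allowed list")
        if "valueShape" in c:
            if isinstance(v, dict):
                errors.extend(validate(v, c["valueShape"], where))  # nested shape
            else:
                errors.append(f"{where}: expected a nested complex value")
    return errors


# ex:MassValueType and ex:WeighingResultType from the examples above.
MASS_VALUE_TYPE = [
    {"predicate": "qudt:numericValue", "nodeKind": "sh:Literal", "datatype": "xsd:double"},
    {"predicate": "qudt:standardUncertainty", "nodeKind": "sh:Literal", "datatype": "xsd:double"},
    {"predicate": "qudt:unit", "nodeKind": "sh:IRI", "hasValue": IRI("qudt-unit:Gram")},
]
WEIGHING_RESULT_TYPE = [
    {"predicate": "af-x:tare weight", "nodeKind": "sh:Blank", "valueShape": MASS_VALUE_TYPE},
    {"predicate": "af-x:net weight", "nodeKind": "sh:Blank", "valueShape": MASS_VALUE_TYPE},
]

weighing_result = {
    "af-x:tare weight": {"qudt:numericValue": 25.3332, "qudt:standardUncertainty": 0.2,
                         "qudt:unit": IRI("qudt-unit:Gram")},
    "af-x:net weight": {"qudt:numericValue": 20.219, "qudt:standardUncertainty": 0.2,
                        "qudt:unit": IRI("qudt-unit:Gram")},
}
```

`validate(weighing_result, WEIGHING_RESULT_TYPE)` returns an empty list, while replacing one of the units by `IRI("qudt-unit:Kilogram")` reports a sh:hasValue violation at the path of that unit.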

Component Specifications

In [[QB]] each data set (or cube) has an associated data structure definition which defines the components of the cube. These components are either dimensions or measures. Dimension components represent independent variables and identify the observations. Measure components represent dependent variables and store the observation values.

A component specification in ADF-DCO has an associated primitive or complex data type.

In addition to the [[!QB]] concepts, ADF-DCO defines adf-dc:Dimension and adf-dc:Measure as subclasses of qb:ComponentSpecification. In [[!QB]], the distinction between measures and dimensions is only implicit, given by the qb:measure and qb:dimension properties. Further, ADF-DCO defines the annotation property adf-dc:componentDataType, which relates a qb:ComponentSpecification to the type of the values comprised by the component. Examples are XSD data types such as xsd:integer or xsd:double, and instances of sh:Shape or rdfs:Resource:

ex:SampleDimension a adf-dc:Dimension, qb:ComponentSpecification ;
                   qb:dimension «af-x:measured sample» ;
                   qb:order 1 ;
                   adf-dc:componentDataType rdfs:Resource .

ex:MeasureCount a adf-dc:Measure, qb:ComponentSpecification ;
                qb:measure «af-x:total cell count» ;
                adf-dc:componentDataType xsd:integer .

ex:MeasureWeight a adf-dc:Measure, qb:ComponentSpecification ;
                 qb:measure «af-x:net weight» ;
                 adf-dc:componentDataType ex:QuantityValueType .
			

Scales

A scale (also known as level of measurement) is a categorization of variables by the type of information their values carry. Scales are essential for defining subsets (e.g. ranges); however, [[QB]] currently does not support scales.

In ADF-DCO, scales define an additional type for instances of adf-dc:Dimension and characterize the type of data values of the component. Defining scale types for dimensions is important, since the scale also determines which operations and selections are possible on the data values of the component. The following figure illustrates the types of scales defined in the ADF Data Cube Ontology:

Class hierarchy of scales defined in ADF-DCO.

Nominal Scale

The nominal scale differentiates between items or subjects based only on their names or (meta-)categories and other qualitative classifications they belong to; thus, dichotomous data involves the construction of classifications as well as the classification of items. In general, nominal scales are used whenever the values of a component can only be tested for equality (==, !=) and no 'natural' order can be defined. This is the case, for example, for measured colors ("red", "blue", ...) or for the IRIs of measured samples (continued example):

ex:SampleDimension  a   qb:ComponentSpecification ,
                        adf-dc:Dimension,
                        adf-dc:NominalScale ;
                    qb:dimension «af-x:measured sample» .
				
Numbers may be used to represent the variables on a nominal scale, but the numbers then do not bear any numerical value or relationship.

Ordinal Scale

An ordinal scale is a scale which allows for rank order (1st, 2nd, 3rd, etc. or very good, good, average, bad) by which data can be sorted, but still does not allow for relative degree of difference between them. Ordinal scales are used for example for an index dimension that represents subsequent measurements or a peak list:

ex:IndexDimension a qb:ComponentSpecification ,
                    adf-dc:Dimension ,
                    adf-dc:OrdinalScale ;
                    qb:dimension «af-x:index» .
				

Cardinal Scale

A cardinal scale is a scale where the difference between two values can be measured and its meaning is independent of the absolute values. There are two types of cardinal scales, namely interval scale and ratio scale which are described next.

Interval Scale

An interval scale is a cardinal scale where ratios between values are not meaningful. For example, temperature on the Celsius scale has an arbitrarily defined zero point (the freezing point of a particular substance under particular conditions). Ratios are not allowed, since 20 °C cannot be said to be 'twice as hot' as 10 °C. Another example is a date measured from an arbitrary epoch (such as AD): as with temperatures, multiplication and division cannot be meaningfully carried out between two dates.

ex:TemperatureDimension a   qb:ComponentSpecification ,
                            adf-dc:Dimension ,
                            adf-dc:IntervalScale ;
                        qb:dimension «af-x:temperature» .
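The temperature example can be checked numerically by converting both values to kelvin, whose zero point is not arbitrary:

```latex
\frac{(20 + 273.15)\,\mathrm{K}}{(10 + 273.15)\,\mathrm{K}}
  = \frac{293.15\,\mathrm{K}}{283.15\,\mathrm{K}} \approx 1.035 \neq 2,
\qquad
20\,^{\circ}\mathrm{C} - 10\,^{\circ}\mathrm{C}
  = 293.15\,\mathrm{K} - 283.15\,\mathrm{K} = 10\,\mathrm{K}.
```

The ratio depends on the choice of zero point, whereas the difference of 10 K is the same on both scales; differences are exactly the information an interval scale preserves.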
                    
Ratio Scale

A ratio scale is a scale which possesses a meaningful (unique and non-arbitrary) zero value. Most measurements in the physical sciences and engineering are done on ratio scales. Examples include mass, length, duration, plane angle, energy and electric charge. Ratios are allowed because having a non-arbitrary zero point makes it meaningful to say, for example, that one object has "twice the length" of another (= is "twice as long").

ex:MassDimension    a   qb:ComponentSpecification ,
                        adf-dc:Dimension ,
                        adf-dc:RatioScale ;
                    qb:dimension «af-x:net weight» .
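The relation between scale types and the operations and selections they permit can be summarized in a small Python sketch. It assumes that each scale permits all operations of the weaker scales (nominal, ordinal, interval, ratio, in that order) and that range selections require at least an ordinal scale; the enum and helper names are hypothetical, and the normative class hierarchy is the one shown in the figure above.

```python
from enum import IntEnum


class Scale(IntEnum):
    """ADF-DCO scale types, ordered from weakest to strongest (assumed)."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


# Operations that become meaningful at each scale; stronger scales
# additionally permit everything allowed by the weaker ones.
NEW_OPERATIONS = {
    Scale.NOMINAL: {"==", "!="},
    Scale.ORDINAL: {"<", "<=", ">", ">="},
    Scale.INTERVAL: {"-"},  # differences are meaningful
    Scale.RATIO: {"/"},     # ratios are meaningful (non-arbitrary zero)
}


def allowed_operations(scale: Scale) -> set[str]:
    """Union of the operations introduced at this scale and all weaker ones."""
    return set().union(*(ops for s, ops in NEW_OPERATIONS.items() if s <= scale))


def allowed_selections(scale: Scale) -> set[str]:
    # Point and unbounded selections need at most equality tests;
    # range selections need an order, i.e. at least an ordinal scale.
    selections = {"PointSelection", "UnboundedSelection"}
    if scale >= Scale.ORDINAL:
        selections.add("RangeSelection")
    return selections
```

For example, a dimension typed as adf-dc:NominalScale (such as ex:SampleDimension) only admits point and unbounded selections, while ex:TemperatureDimension (interval scale) also admits range selections but no ratios.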
                    

Order Functions

An order function specifies how the values of a component specification are compared. Depending on the data type associated with a component specification and on its scale type, different types of order functions can be specified.

The following figure illustrates the different order functions defined by ADF-DCO:

In ADF-DCO, a QB component specification has an associated order function, which is either a native, a lexicographical, a quantity value or a complex value order.

For native, lexicographical and quantity value order functions, standard instances are defined. That is, in addition to the classes adf-dc:NativeOrder, adf-dc:LexicographicalOrder and adf-dc:QuantityValueOrder, ADF-DCO defines the instances adf-dc:nativeOrder, adf-dc:lexicographicalOrder and adf-dc:quantityValueOrder. Thus, order functions only have to be specified in detail for complex values.

In the following, the different order functions are described in detail and illustrated with examples. The examples are based on a mass measurement result which has three components: index, time and mass measure:

ex:MassMeasurementResult a qb:DataSet;
    qb:structure ex:MassMeasurementResultStructure .

ex:MassMeasurementResultStructure a qb:DataStructureDefinition;
    qb:component ex:IndexDimension;
    qb:component ex:TimeDimension;
    qb:component ex:MassMeasure .
            

An observation of this data set might look like this:

ex:obs123 a qb:Observation;
    qb:dataSet ex:MassMeasurementResult;
    «af-x:index» 4;
    «af-x:event duration»  [
        qudt:numericValue '24.02'^^xsd:double ;
        qudt:unit qudt-unit:SecondTime ;
        ];
    ex:complexMassMeasure [
        «af-x:net weight» [
            qudt:numericValue '14.0'^^xsd:double ;
            qudt:standardUncertainty '0.2'^^xsd:double ;
            qudt:unit qudt-unit:Gram
        ];
        «af-x:tare weight» [
            qudt:numericValue '15.0'^^xsd:double ;
            qudt:standardUncertainty '0.8'^^xsd:double ;
            qudt:unit qudt-unit:Gram
        ] ;
    ] .
            

Native Order

A native order is defined as the total order of the real numbers (or a subset thereof) if a component specification refers to xsd:integer, xsd:decimal, xsd:double or xsd:float. The native order for timestamps and durations (xsd:dateTime, xsd:date, xsd:duration) is increasing time, and the native order of booleans (xsd:boolean) is: false is less than true.

Regarding the example above, the ex:IndexDimension can be associated with a native order as follows:

ex:IndexDimension   a   adf-dc:Dimension, adf-dc:OrdinalScale;
                    adf-dc:orderedBy adf-dc:nativeOrder .
                

The framework MUST support implementation of native orders for all primitive XSD data types.
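For primitive values, Python's built-in comparison operators already implement the native order described above for numbers, for xsd:date and xsd:dateTime values mapped to `datetime` objects, and for booleans (False < True). The following sketch wraps this in a three-way comparison function; the function name and the mapping of XSD types to Python types are assumptions, and xsd:duration is not covered by this sketch.

```python
import datetime
from functools import cmp_to_key
from typing import Any


def native_compare(a: Any, b: Any) -> int:
    """Three-way comparison (-1, 0, 1) under the native order.

    Both values must have the same Python type, e.g. int for xsd:integer,
    float for xsd:double, datetime.date for xsd:date, bool for xsd:boolean.
    """
    if type(a) is not type(b):
        raise TypeError(f"cannot compare {type(a).__name__} with {type(b).__name__}")
    return (a > b) - (a < b)


# Sorting values of an integer index dimension and of an xsd:date component.
indexes = sorted([4, 1, 3], key=cmp_to_key(native_compare))
dates = sorted([datetime.date(2020, 3, 3), datetime.date(2019, 12, 12)],
               key=cmp_to_key(native_compare))
```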

Lexicographical Order

The lexicographical order is an order function which orders strings character by character. It applies, for example, to a component specification with the component data type xsd:string.

# index dimension adaptation to the example above, if the integer property «af-x:index» would be replaced by a string identifier property
ex:obs123 a qb:Observation ;
    qb:dataSet ex:MassMeasurementResult ;
    dct:identifier "sample 4".

ex:IndexDimension a adf-dc:Dimension, adf-dc:OrdinalScale ;
    qb:dimension dct:identifier;
    adf-dc:orderedBy adf-dc:lexicographicalOrder .
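A minimal sketch of the lexicographical order, assuming comparison by Unicode code point (this document does not prescribe a collation). The usage line shows why numeric identifiers stored as strings sort differently than under a native order: "sample 10" precedes "sample 2" and "sample 4" because the character '1' is less than '2' and '4'.

```python
from functools import cmp_to_key


def lexicographic_compare(a: str, b: str) -> int:
    """Compare two strings character by character (by Unicode code point).

    The first differing character decides; if one string is a prefix of the
    other, the shorter string comes first.
    """
    for char_a, char_b in zip(a, b):
        if char_a != char_b:
            return -1 if char_a < char_b else 1
    return (len(a) > len(b)) - (len(a) < len(b))


identifiers = sorted(["sample 4", "sample 10", "sample 2"], key=cmp_to_key(lexicographic_compare))
# identifiers == ["sample 10", "sample 2", "sample 4"]
```

This is also the reason why the integer property «af-x:index» in the original example is better served by a native order.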
                

Quantity Value Order

The quantity value order is an order function which is defined for instances of qudt:QuantityValue that describe values with the same underlying quantity kind. It orders the quantity values by their numeric values normalized to the SI standard unit, e.g. all length quantity values are first converted internally into meters and then ordered by the normalized numeric value.

Regarding the example above, the ex:TimeDimension can be associated with a quantity value order as follows:

ex:TimeDimension adf-dc:orderedBy adf-dc:quantityValueOrder .
                

The properties defined in the property path MUST be specified by full URIs.

The framework MUST implement the actual comparison functions, which handle quantity values with different units of the same quantity kind. For example, a duration specified in minutes is comparable to a duration specified in seconds. The conversion factors are provided by [[!QUDT]].

Equality of quantity values is also defined by the comparison function. Thus, 60 seconds are considered equal to 1 minute when an adf-dc:quantityValueOrder is specified.
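The following Python sketch illustrates a quantity value order. It assumes that each unit is described by its quantity kind, a conversion multiplier and a conversion offset to the SI unit (QUDT provides such conversion data; the small table here is hard-coded for illustration only), and it uses exact fractions so that 60 seconds and 1 minute compare as equal without floating-point rounding. The class, table and function names are hypothetical.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class QuantityValue:
    numeric_value: Fraction
    unit: str


# Hypothetical excerpt of unit data: unit -> (quantity kind, multiplier, offset),
# such that value_in_SI_unit = numeric_value * multiplier + offset.
UNITS = {
    "qudt-unit:SecondTime": ("Time", Fraction(1), Fraction(0)),
    "qudt-unit:MinuteTime": ("Time", Fraction(60), Fraction(0)),
    "qudt-unit:HourTime": ("Time", Fraction(3600), Fraction(0)),
    "qudt-unit:Gram": ("Mass", Fraction(1, 1000), Fraction(0)),
    "qudt-unit:Kilogram": ("Mass", Fraction(1), Fraction(0)),
    "qudt-unit:DegreeCelsius": ("Temperature", Fraction(1), Fraction("273.15")),
    "qudt-unit:Kelvin": ("Temperature", Fraction(1), Fraction(0)),
}


def to_si(q: QuantityValue) -> tuple[str, Fraction]:
    """Normalize a quantity value to the SI unit of its quantity kind."""
    kind, multiplier, offset = UNITS[q.unit]
    return kind, q.numeric_value * multiplier + offset


def quantity_value_compare(a: QuantityValue, b: QuantityValue) -> int:
    """Three-way comparison of two quantity values of the same quantity kind."""
    kind_a, si_a = to_si(a)
    kind_b, si_b = to_si(b)
    if kind_a != kind_b:
        raise ValueError(f"cannot compare {kind_a} with {kind_b}")
    return (si_a > si_b) - (si_a < si_b)


sixty_seconds = QuantityValue(Fraction(60), "qudt-unit:SecondTime")
one_minute = QuantityValue(Fraction(1), "qudt-unit:MinuteTime")
```

`quantity_value_compare(sixty_seconds, one_minute)` returns 0, matching the equality rule above; comparing a time with a mass raises a `ValueError` because the quantity kinds differ.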

Complex Value Order

A complex value order is an order function which is defined for component specifications with complex data types represented by shapes. It consists of a set of items, each defining a property path of the shape, an order number and a (sub-)order function for the respective values. Thus, a complex value order can be considered a sorted list of orders specified for property paths of a shape. A complex value order is a generalization of the quantity value order.

Complex value orders MAY be nested as shown in the example below.
ex:MassMeasure adf-dc:orderedBy ex:complexResultOrder .

ex:complexResultOrder a adf-dc:ComplexValueOrder;
    adf-dc:hasItem [
        adf-dc:propertyPath "«af-x:net weight»/qudt:numericValue"; # points to a primitive value
        adf-dc:order 1;                             # the net weight numeric value is considered first
        adf-dc:orderedBy adf-dc:nativeOrder ;       # standard reference to native order
    ];
    adf-dc:hasItem [
        adf-dc:propertyPath "«af-x:tare weight»";   # points to a complex value
        adf-dc:order 2;                             # the tare weight is considered second within the complex result order
        adf-dc:orderedBy adf-dc:quantityValueOrder  # reference to an order function, which handles the complex data type.
    ].
                

The second item of ex:complexResultOrder shows how nesting works in order functions.
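The semantics of ex:complexResultOrder can be sketched in Python as a lexicographic combination of its items: items are evaluated by increasing adf-dc:order, and the first item whose sub-order does not report equality decides. The sketch assumes '/'-separated property paths over nested dictionaries and uses a simplified, hard-coded stand-in for adf-dc:quantityValueOrder; all function names are hypothetical.

```python
from functools import cmp_to_key
from typing import Any, Callable

Compare = Callable[[Any, Any], int]


def native_order(a: Any, b: Any) -> int:
    return (a > b) - (a < b)


# Simplified stand-in for adf-dc:quantityValueOrder (assumed factors to kilogram).
TO_KILOGRAM = {"qudt-unit:Gram": 0.001, "qudt-unit:Kilogram": 1.0}


def quantity_value_order(a: dict, b: dict) -> int:
    return native_order(a["qudt:numericValue"] * TO_KILOGRAM[a["qudt:unit"]],
                        b["qudt:numericValue"] * TO_KILOGRAM[b["qudt:unit"]])


def resolve(value: Any, property_path: str) -> Any:
    """Follow a '/'-separated property path through nested complex values."""
    for predicate in property_path.split("/"):
        value = value[predicate]
    return value


def complex_value_order(items: list[tuple[int, str, Compare]]) -> Compare:
    """Build a comparison from (order number, property path, sub-order) items."""
    ranked = sorted(items, key=lambda item: item[0])

    def compare(a: dict, b: dict) -> int:
        for _, property_path, sub_order in ranked:
            result = sub_order(resolve(a, property_path), resolve(b, property_path))
            if result != 0:
                return result
        return 0

    return compare


# Counterpart of ex:complexResultOrder (listed out of order on purpose).
complex_result_order = complex_value_order([
    (2, "af-x:tare weight", quantity_value_order),
    (1, "af-x:net weight/qudt:numericValue", native_order),
])

m1 = {"af-x:net weight": {"qudt:numericValue": 14.0, "qudt:unit": "qudt-unit:Gram"},
      "af-x:tare weight": {"qudt:numericValue": 15.0, "qudt:unit": "qudt-unit:Gram"}}
m2 = {"af-x:net weight": {"qudt:numericValue": 14.0, "qudt:unit": "qudt-unit:Gram"},
      "af-x:tare weight": {"qudt:numericValue": 0.02, "qudt:unit": "qudt-unit:Kilogram"}}
m3 = {"af-x:net weight": {"qudt:numericValue": 13.0, "qudt:unit": "qudt-unit:Gram"},
      "af-x:tare weight": {"qudt:numericValue": 30.0, "qudt:unit": "qudt-unit:Gram"}}
ordered = sorted([m1, m2, m3], key=cmp_to_key(complex_result_order))
```

Here m3 comes first because its net weight is smallest; m1 and m2 have equal net weights, so the nested quantity value order on the tare weight decides (15 g is less than 0.02 kg).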

Subsetting of Data Cubes

ADF-DCO provides comprehensive means to select subsets of data contained in a data cube (qb:DataSet).

QB Slices

The RDF Data Cube Vocabulary [[!QB]] provides the concept of a slice. A slice denotes a subset of a data set, defined by fixing the values of a subset of the dimensions. That is, a qb:Slice restricts each of the selected dimensions to a single value.

Data Selections

ADF-DCO introduces the concept of data selections. Data selections can be based on dimensions or measures.

If a selection is (solely) based on the dimensions of the cube, such a selection is called a data slab. The slices defined in QB are a special kind of slabs.

If a selection is based on the measures, it is a projection or a filter. If a subset of multiple measures is selected or parts of the datatypes of the measures are selected via property paths, then the selection is a data projection.

If the selection is based on the observed values, then it is a data filter. While filters and slabs look similar, there is an important difference between them: within a dimension, each value is required to be distinct from all other values of the same dimension. Duplicate values are not possible, since otherwise the dimension could not identify a point in the cube. For measures, this limitation does not exist.

A data selection is an n-dimensional subset of a data cube where the set of observations (i.e. the measures) is selected based on component selections on dimensions or measures. The following figure illustrates adf-dc:DataSelection with related classes in QB:

A data selection is related to a data set and defines a selection structure which consists of component selections.

For each dimension component of the data structure definition, the selection structure definition MUST specify a dimension selection. For each measure component of the data structure definition that is part of the selection, the selection structure definition MUST specify a measure selection. At least one measure selection MUST be defined. A data selection allows defining different types of component selections on components, such as point or range selections. If no point or range selection is specified, the component selection is assumed to be an unbounded selection; however, the type of selection SHOULD be explicitly stated.
The different types of selections are described below.

Since adf-dc:DataSelection provides more possibilities to define selections on dimensions, it extends the concept of qb:Slice. In particular, each slice can be represented as a data selection on dimensions by using point selections.

The running example for the following subsections is a cell counter measurement result, expressed by a data cube with two dimension components (index and sample) and one measure component (total cell count).

# An observation of the data cube
ex:obs13 a qb:Observation;
   qb:dataSet ex:CellCounterResultSet;
   «af-x:index» 3;
   «af-x:sample» ex:sample1;
   «af-x:total cell count» 98 .

# The data cube has a structure
ex:CellCounterResultSet a qb:DataSet, «af-x:cell counter measurement result»;
   qb:structure ex:CellCountStructure .

# The data structure definition specifies three components
ex:CellCountStructure a qb:DataStructureDefinition;
   qb:component ex:IndexDimension;
   qb:component ex:SampleDimension;
   qb:component ex:CellCountMeasure .

# dimension: index
ex:IndexDimension a qb:ComponentSpecification, adf-dc:Dimension, adf-dc:OrdinalScale ;
   qb:dimension «af-x:index»;
   qb:order 1;
   adf-dc:componentDataType xsd:integer .

# dimension: sample
ex:SampleDimension a qb:ComponentSpecification, adf-dc:Dimension, adf-dc:NominalScale ;
   qb:dimension «af-x:sample»;
   qb:order 2;
   adf-dc:componentDataType rdfs:Resource .

# measure: cell count
ex:CellCountMeasure a qb:ComponentSpecification, adf-dc:Measure, adf-dc:OrdinalScale ;
   qb:measure «af-x:total cell count»;
   adf-dc:componentDataType xsd:integer .
                

Component Selections

A component selection is a specification of a set of values of a single component specification of a data cube. The following figure illustrates the different subclasses of adf-dc:ComponentSelection that further define the kind of selection used:

component selection subclasses

Selections are defined either by a list of items (point selection) or by a range (range selection), depending on the type of the component specification. The other criterion is whether the selection is done on a dimension or on a measure: a component selection on a measure is a measure selection, and a component selection on a dimension is a dimension selection. If a measure selection is not an unbounded selection, the data selection acts as a filter on the data cube. If not all measures of the data structure definition are selected, or if the measure selection defines property paths on the underlying data type, the data selection acts as a projection on the data cube. A data selection needs to specify exactly one component selection for each dimension component, so the data slab part MUST always be fully defined. Based on the data structure definition, an adf-dc:DataSelection with a corresponding adf-dc:SelectionStructureDefinition is specified as follows:

# The data selection is a slab
ex:CellCounterSlab a adf-dc:DataSelection;
    adf-dc:dataSelectionOf ex:CellCounterResultSet ;
    qb:structure ex:CellCountSlabStructure .

# The selection structure definition specifies two component selections for the dimension components
# defining the slab and includes the measure component
ex:CellCountSlabStructure a adf-dc:SelectionStructureDefinition;
   adf-dc:selects ex:IndexDimensionSelection;
   adf-dc:selects ex:SampleDimensionSelection;
   adf-dc:selects ex:CellCountMeasureSelection .
                

The kind of scale of the dimension component determines which selections are allowed.

Point selection

A point selection is a component selection on a dimension or measure that selects a set of distinct and named values.

# The sample dimension selection is a point selection
ex:SampleDimensionSelection a adf-dc:PointSelection, adf-dc:DimensionSelection ;
    adf-dc:selectionOn ex:SampleDimension ;
    adf-dc:hasItem ex:sample1, ex:sample2, ex:sample10 .
                    
Unbounded selection

An unbounded selection is a component selection on a dimension or measure which selects all values.

# The cell count measure selection is an unbounded selection
ex:CellCountMeasureSelection a adf-dc:UnboundedSelection, adf-dc:MeasureSelection ;
    adf-dc:selectionOn ex:CellCountMeasure .
                    
Range Selection with Primitive Data Types

A range selection is a component selection on a component specification which represents an ordinal scale. A range selection MUST define a minimum value and a maximum value when the component data type is primitive. If the component has a complex data type, then the minimum and maximum need to be specified by reference to complex values that conform to the shape defined for the component specification.

# The index dimension selection defines a range
ex:IndexDimensionSelection a adf-dc:RangeSelection, adf-dc:DimensionSelection ;
    adf-dc:selectionOn ex:IndexDimension ;
    adf-dc:minimumValue 2;
    adf-dc:maximumValue 10 .
                    

It is also possible to define unbounded and right or left bounded range selections.
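The interplay of point, unbounded and range selections can be sketched in Python on the cell counter example. The sketch assumes inclusive range bounds and uses None for a missing bound (left- or right-unbounded ranges); the class and function names mirror the ADF-DCO concepts but are hypothetical, observations are modelled as plain dictionaries, and apart from ex:obs13 the observation values are made up for illustration. A qb:Slice corresponds to point selections with a single item on the fixed dimensions.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class PointSelection:
    items: frozenset

    def matches(self, value: Any) -> bool:
        return value in self.items


@dataclass(frozen=True)
class UnboundedSelection:
    def matches(self, value: Any) -> bool:
        return True


@dataclass(frozen=True)
class RangeSelection:
    minimum: Optional[Any] = None  # None: no lower bound
    maximum: Optional[Any] = None  # None: no upper bound

    def matches(self, value: Any) -> bool:
        # Assumption: both bounds are inclusive.
        if self.minimum is not None and value < self.minimum:
            return False
        return self.maximum is None or value <= self.maximum


def select(observations: list[dict], dimension_selections: dict,
           measure_selections: dict) -> list[dict]:
    """Keep observations matching every component selection.

    Measures without a measure selection are dropped (projection); a measure
    selection other than an unbounded one filters observations.
    """
    selections = {**dimension_selections, **measure_selections}
    return [
        {component: obs[component] for component in selections}
        for obs in observations
        if all(sel.matches(obs[component]) for component, sel in selections.items())
    ]


observations = [
    {"af-x:index": 1, "af-x:sample": "ex:sample1", "af-x:total cell count": 120},
    {"af-x:index": 3, "af-x:sample": "ex:sample1", "af-x:total cell count": 98},
    {"af-x:index": 5, "af-x:sample": "ex:sample3", "af-x:total cell count": 87},
    {"af-x:index": 10, "af-x:sample": "ex:sample10", "af-x:total cell count": 101},
    {"af-x:index": 12, "af-x:sample": "ex:sample2", "af-x:total cell count": 76},
]

# Counterpart of ex:CellCounterSlab: index in [2, 10], samples 1, 2 and 10, all counts.
slab = select(
    observations,
    {"af-x:index": RangeSelection(2, 10),
     "af-x:sample": PointSelection(frozenset({"ex:sample1", "ex:sample2", "ex:sample10"}))},
    {"af-x:total cell count": UnboundedSelection()},
)
```

The slab keeps the observations with index 3 and 10; replacing the unbounded measure selection by `RangeSelection(minimum=100)` would turn the data selection into a filter on the cell count.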

Range Selection with Complex Data Types

If a dimension component is associated with a complex data type, the minimum and maximum MUST reference complex data values accordingly. For instance, a cube may specify a time dimension component with a complex data type as follows:

ex:TimeDimension    a   qb:ComponentSpecification,
                        adf-dc:Dimension,
                        adf-dc:IntervalScale ;
                    qb:dimension «af-x:total retention time»;
                    qb:order 1;
                    adf-dc:componentDataType ex:DurationType .
                    

The duration type has a numeric value and a unit which is specified through a shape:

ex:DurationType a sh:Shape ;
    sh:property [
          sh:predicate qudt:numericValue;
          sh:minCount 1;
          sh:maxCount 1;
          sh:nodeKind sh:Literal ;
          sh:datatype xsd:double ;
        ];
   sh:property [
          sh:predicate qudt:unit;
          sh:minCount 1;
          sh:maxCount 1;
          sh:nodeKind sh:IRI;
          sh:class qudt:TimeUnit  # allowed values are, e.g., qudt-unit:SecondTime or qudt-unit:MinuteTime
        ];
        .
                    

Accordingly, the range selection on a component with complex data type defines two complex values for minimum and maximum:

ex:TimeDimensionSelection a adf-dc:RangeSelection, adf-dc:DimensionSelection ;
    adf-dc:selectionOn ex:TimeDimension ;
    # from 0.5 seconds
    adf-dc:minimum [
        a qudt:QuantityValue ;
        qudt:numericValue "0.5"^^xsd:double ;
        qudt:unit qudt-unit:SecondTime
    ];
    # up to half an hour
    adf-dc:maximum [
        a qudt:QuantityValue ;
        qudt:numericValue "30"^^xsd:double ;
        qudt:unit qudt-unit:MinuteTime
    ] .
                    
It is up to the implementation of the Data Cube API to retrieve all values for the corresponding range.

For quantity values of the same quantity kind (as defined by [[QUDT]]) value comparisons will be directly implemented by the ADF Data Cube API. For complex data types, an order function MUST be specified, which is then referred to during comparison.
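A sketch of how an implementation might evaluate ex:TimeDimensionSelection: both bounds and each candidate value are normalized to seconds before comparing. The conversion table, the inclusive bounds and the function names are assumptions for illustration; in practice the factors would come from QUDT. Numeric values are kept as lexical strings so that `Fraction` can convert them exactly.

```python
from fractions import Fraction

# Assumed conversion factors to the SI unit second (illustrative only).
SECONDS_PER_UNIT = {
    "qudt-unit:SecondTime": Fraction(1),
    "qudt-unit:MinuteTime": Fraction(60),
    "qudt-unit:HourTime": Fraction(3600),
}


def to_seconds(value: dict) -> Fraction:
    """Normalize a duration value {numericValue, unit} to seconds."""
    return Fraction(value["qudt:numericValue"]) * SECONDS_PER_UNIT[value["qudt:unit"]]


def in_range(value: dict, minimum: dict, maximum: dict) -> bool:
    """Inclusive range test on durations given in possibly different units."""
    return to_seconds(minimum) <= to_seconds(value) <= to_seconds(maximum)


# Bounds of ex:TimeDimensionSelection: from 0.5 seconds up to 30 minutes.
minimum = {"qudt:numericValue": "0.5", "qudt:unit": "qudt-unit:SecondTime"}
maximum = {"qudt:numericValue": "30", "qudt:unit": "qudt-unit:MinuteTime"}
```

With these bounds, half a minute (30 s) lies inside the range, whereas one hour (3600 s) lies outside it.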

Selections on Measure Components

While a selection structure definition MUST provide component selections for all dimension components of the related data structure definition, component selections on measure components MUST be defined for at least one measure component. All measure components that are not selected are not part of the resulting data selection. Each measure selection MUST also be a point, range or unbounded selection. The datacube shape library [[AFS-DC]] defines adf-dc:UnboundedSelection as the default, but the RDF data description SHOULD state this explicitly.

Measure selections can also define a property path if the data type is complex and defined by a shape. This is another way of building a projection on the data. The next example shows how a projection on the complex data type ex:DurationType of a time measure can be defined.

ex:TimeMeasure    a   qb:ComponentSpecification,
                        adf-dc:Measure,
                        adf-dc:IntervalScale ;
                    qb:measure «af-x:total retention time»;
                    qb:order 1;
                    adf-dc:componentDataType ex:DurationType .

ex:TimeMeasureSelection a adf-dc:UnboundedSelection, adf-dc:MeasureSelection; # selection on measure, no filtering on data
                     adf-dc:selectionOn ex:TimeMeasure;
                     adf-dc:propertyPath "qudt:numericValue" . # projection on the numeric value part of the complex data type DurationType
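In terms of the resulting data, this projection replaces each complex ex:DurationType value by its numeric value. A minimal Python sketch of this effect, assuming observations modelled as dictionaries and '/'-separated property paths (the function name and the sample values are hypothetical):

```python
from typing import Any


def project(observations: list[dict], measure: str, property_path: str) -> list[dict]:
    """Replace each complex measure value by the value found at property_path."""
    projected = []
    for obs in observations:
        value: Any = obs[measure]
        for predicate in property_path.split("/"):
            value = value[predicate]
        projected.append({**obs, measure: value})  # original observation is unchanged
    return projected


retention_times = [
    {"af-x:index": 1,
     "af-x:total retention time": {"qudt:numericValue": 12.5, "qudt:unit": "qudt-unit:MinuteTime"}},
    {"af-x:index": 2,
     "af-x:total retention time": {"qudt:numericValue": 14.0, "qudt:unit": "qudt-unit:MinuteTime"}},
]
numeric_only = project(retention_times, "af-x:total retention time", "qudt:numericValue")
```

Note that the unit is no longer part of the projected values, so such a projection is most useful when all values of the measure share one unit.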
                    

Change History

Version Release Date Remarks
0.4.0 2015-06-18
  • Initial Working Draft version
1.0.0 RC 2015-09-17
  • Complete revision of concepts
1.0.0 2015-09-29
  • Incorporated feedback of reviews
  • Updated versions, dates and document status
1.1.0 RC 2016-03-11
  • Updated versions, dates and document status
  • Added section on number formatting to document conventions
  • Updated Fig. 1
1.1.0 RF 2016-03-31
  • Updated versions, dates and document status
1.1.5 2016-05-13
  • Updated versions and dates
1.2.0 Preview 2016-09-23
  • Updated versions and dates
1.2.0 RC 2016-12-07
  • Updated versions and dates
1.3.0 Preview 2017-03-31
  • Updated versions and dates
  • Updated section 5.5.2 (Data Selections)
1.3.0 RF 2017-06-30
  • Updated versions and dates
  • Adaptations to new business model
  • Minor edits
1.4.3 RC 2018-10-11
  • Updated versions and dates
1.4.5 RF 2018-12-17
  • Updated versions and dates
1.5.0 RC 2019-12-12
  • Updated versions and dates
1.5.0 RF 2020-03-03
  • Updated HDF5 reference link
  • Added overwritten images
1.5.3 RF 2020-11-30
  • Updated broken reference links
  • Updated PURL and DOCS server links to relative links
  • Reformat the document header